foundation model
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning Zachary Charles
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions.
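As a rough illustration of what a "user-specified partition" means here, the sketch below groups examples from a flat dataset by a key function. It is not Dataset Grouper's API: `partition_by_key`, the `domain` field, and the sample records are all hypothetical, and this version holds everything in memory, so it does not reflect the library's ability to handle groups too large for memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def partition_by_key(
    examples: Iterable[T], get_key: Callable[[T], Hashable]
) -> Dict[Hashable, List[T]]:
    """Group examples by a user-supplied key (hypothetical helper, in-memory only)."""
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for example in examples:
        # Each example is assigned to exactly one group, chosen by the key function.
        groups[get_key(example)].append(example)
    return dict(groups)


# Made-up base dataset: text documents tagged with the web domain they came from.
docs = [
    {"domain": "a.org", "text": "first"},
    {"domain": "b.org", "text": "second"},
    {"domain": "a.org", "text": "third"},
]

# Partition by domain, so each domain plays the role of one "client" or group.
groups = partition_by_key(docs, lambda d: d["domain"])
```

Swapping the key function (for example, grouping by author instead of by domain) yields a different group structure over the same base dataset, which is the flexibility the abstract describes.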
- North America > United States > Virginia (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models Zhimin Chen
Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the existence of the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process for the masked autoencoder, enabling more focused attention on foreground representations.
- Asia > Middle East > Israel (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Singapore (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Public Health (0.97)
- Information Technology (0.93)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.70)
- Information Technology > Sensing and Signal Processing > Image Processing (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Learning to see the physical world: an interview with Jiajun Wu
What is your research area? My research topic, at a high level, hasn't changed much since my dissertation. It has always been the problem of physical scene understanding - building machines that see, reason about, and interact with the physical world. Besides learning algorithms, what are the levels of abstraction needed by AI systems in their representations, and where do they come from? I aim to answer these fundamental questions, drawing inspiration from nature, i.e., the physical world itself, and from human cognition.
Limitations
While our study identifies clear separations between model hypothesis classes, our best models still have not reached the consistency ceiling of the neural and behavioral benchmarks we have compared against. All models were simultaneously trained across all eight scenarios of the Physion Dynamics Training Set, constituting around 16,000 total training scenarios (2,000 scenes per scenario) [Bear et al., 2021], with a [...] Each C-SWM [Kipf et al., 2020] model was trained on [...] For each stimulus, we compute the proportion of "hit" responses by [...] The Correlation to Average Human Response is the Pearson correlation between the model probability-hit vector and the human proportion-hit vector, across stimuli per scenario. OCP Accuracy of humans and models is the average accuracy, across stimuli per scenario. To give the final values of the two quantities, we then compute the weighted mean and s.e.m. of the above per [...] Note that these values are therefore different for each condition, but always the same across all models. All neural predictivities are reported on held-out conditions and their timepoints.
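The two per-scenario behavioral metrics described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the 0.5 decision threshold used for accuracy, and the sample numbers are assumptions, and the final weighted mean and s.e.m. step is left out because the source text does not say what it is weighted over.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def accuracy(prob_hit: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of stimuli whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    if len(prob_hit) != len(labels) or not labels:
        raise ValueError("need non-empty vectors of equal length")
    correct = sum((p >= threshold) == bool(y) for p, y in zip(prob_hit, labels))
    return correct / len(labels)


# Made-up values for the stimuli of a single scenario.
model_prob_hit = [0.9, 0.2, 0.6, 0.4]       # model probability of "hit" per stimulus
human_prop_hit = [0.8, 0.1, 0.7, 0.3]       # proportion of humans answering "hit"
ground_truth = [1, 0, 0, 0]                 # 1 = contact actually occurred

corr_to_human = pearson(model_prob_hit, human_prop_hit)
ocp_accuracy = accuracy(model_prob_hit, ground_truth)
```

Both functions operate on the stimuli of a single scenario; in the procedure described above they would be computed once per scenario before any aggregation across scenarios.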